operator: KBS API for LUKS key registration#248
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: iroykaufman The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Jakob-Naucke
left a comment
There was a problem hiding this comment.
Thank you for creating this, and thank you for already creating a test. Make sure that linting & build/unit tests pass.
| pub async fn launch_trustee_sync_controller(client: Client) { | ||
| let deployments: Api<Deployment> = Api::default_namespaced(client.clone()); | ||
| let watcher_config = watcher::Config { | ||
| label_selector: Some("app=kbs".to_string()), |
There was a problem hiding this comment.
nit: maybe a constant also used in generate_kbs_{service,deployment} is better
| use serde::{Serialize, Serializer}; | ||
| use serde_json::{Value::String as JsonString, json}; | ||
| use std::collections::BTreeMap; | ||
| use kbs_client; |
There was a problem hiding this comment.
Is this going to be https://github.com/confidential-containers/trustee/tree/main/tools/kbs-client?
dc726d8 to
c5ec81d
Compare
|
The CI is failing because we need to have some Perl dependency. I solved this by adding this |
|
@iroykaufman these tests run in (Ubuntu) GHA containers that aren't affected by the Containerfile (this would get picked up in integration tests once I manage to fix them). You'll need to add them to the steps in steps:
- name: "Install OpenSSL dependencies"
run: apt-get install -y … |
|
@iroykaufman whoops, you can set your own build container, and we do, and it's a Fedora container, so it should be |
|
@iroykaufman thanks for the updates. integration tests are failing because these packages are missing there too. in my intuition, those shouldn't need to be built on the host at all so let me look into that first. e: before the next push you can also still look into the lint failures |
Thanks, let me know and then I'll push with the lint fix |
|
@Jakob-Naucke maybe update the container file in buildroot with the openssl dependency is better then adding them in the github workflow files. WDYT? |
Yes. I got confused by first thinking we were on GHA's Ubuntu containers, but the buildroot is the better place. |
|
and the CI host does need these to compile the integration tests, which is slightly not how I thought cargo dependencies worked, but alas, these packages are now installed |
- Right before the trustee first deploys the operator, create an auth key for the KBS API and save it as a kube secret. - Replacing the patch mechanism with API calls - Add a reconcile loop that watches for changes in the trustee deployment and sync LUKS key that was lost. Test: test_luks_key_sync - this test checks that the luks key is first sent to tustee and validates that after trustee restarts, the keys are sent again. Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
|
I deleted the changes for the github workflow and opened PR#9. When this is merged, the CI should work |
|
@iroykaufman: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@sourcery-ai review |
Reviewer's GuideImplements KBS API–based LUKS key management so trustee no longer needs its deployment patched on each key update, introduces an auth keypair/secret for KBS, adds a controller to resync machine LUKS keys when trustee pods restart, switches KBS storage to kvstorage, and adds an end‑to‑end test covering LUKS key sync and deletion. Sequence diagram for LUKS key registration via KBS APIsequenceDiagram
actor User as ClusterOperator
participant K8s as KubernetesAPI
participant RegSrv as RegisterServerController
participant Trustee as TrusteeModule
participant KBS as KBSService
User->>K8s: Create/Update Machine
K8s-->>RegSrv: Machine event
RegSrv->>RegSrv: keygen_reconcile
RegSrv->>Trustee: generate_secret(client, id, owner_reference)
Trustee->>K8s: Create Secret id (root key)
K8s-->>Trustee: Secret id created
RegSrv->>Trustee: send_secret(client, id)
Trustee->>K8s: Get Secret id
K8s-->>Trustee: Secret id (root data)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
KBS-->>Trustee: 201 Created
Trustee-->>RegSrv: Ok
RegSrv-->>K8s: Reconcile result Action::await_change()
Sequence diagram for LUKS key resync on trustee restartsequenceDiagram
participant K8s as KubernetesAPI
participant SyncCtl as TrusteeSyncController
participant TrusteeDep as TrusteeDeployment
participant Trustee as TrusteeModule
participant KBS as KBSService
Note over TrusteeDep,K8s: TrusteeDeployment restarted
K8s-->>SyncCtl: Deployment event (label app=kbs)
SyncCtl->>SyncCtl: trustee_deployment_reconcile
SyncCtl->>TrusteeDep: Read status (ready_replicas, replicas)
TrusteeDep-->>SyncCtl: Status ready
SyncCtl->>Trustee: sync_all_machine_luks_key(client)
Trustee->>K8s: List Secrets (default namespace)
K8s-->>Trustee: Secrets owned by Machines
loop For each machine Secret id
Trustee->>Trustee: send_secret(client, id)
Trustee->>K8s: Get Secret id
K8s-->>Trustee: Secret id (root data)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
KBS-->>Trustee: Response
end
Trustee-->>SyncCtl: Ok
SyncCtl-->>K8s: Action::await_change()
Sequence diagram for LUKS key deletion via KBS APIsequenceDiagram
participant K8s as KubernetesAPI
participant RegSrv as RegisterServerController
participant Trustee as TrusteeModule
participant KBS as KBSService
K8s-->>RegSrv: Machine deletion event
RegSrv->>RegSrv: keygen_reconcile finalizer
RegSrv->>Trustee: delete_secret(client, id)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: delete_resource(url, auth_key, path, [])
KBS-->>Trustee: 200 OK
Trustee-->>RegSrv: Ok
RegSrv-->>K8s: Finalizer completed (Action::await_change())
Class diagram for new Ed25519 key pair and trustee auth secret generationclassDiagram
class Ed25519KeyPair {
+Vec~u8~ private_key_pem
+Vec~u8~ public_key_pem
}
class TrusteeModule {
+generate_ed25519_key_pair() Result~Ed25519KeyPair~
+generate_trustee_auth_keys_secret(client, owner_reference) Result~()~
+get_auth_key(client) Result~String~
+send_secret(client, id) Result~()~
+delete_secret(client, id) Result~()~
+sync_all_machine_luks_key(client) Result~()~
+launch_trustee_sync_controller(client) void
}
TrusteeModule --> Ed25519KeyPair : creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
launch_trustee_sync_controller, the watcherlabel_selectoris set as"app={KBS_LABEL_SELECTOR}", which will not expand the constant and thus never match your KBS pods; this should be built withformat!("app={KBS_LABEL_SELECTOR}")or similar. - The new
test_luks_key_syncrelies on grepping operator logs for specific hard-coded message substrings (including the exact count"Syncing 2 machine luks key to KBS"), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `launch_trustee_sync_controller`, the watcher `label_selector` is set as `"app={KBS_LABEL_SELECTOR}"`, which will not expand the constant and thus never match your KBS pods; this should be built with `format!("app={KBS_LABEL_SELECTOR}")` or similar.
- The new `test_luks_key_sync` relies on grepping operator logs for specific hard-coded message substrings (including the exact count `"Syncing 2 machine luks key to KBS"`), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
## Individual Comments
### Comment 1
<location path="operator/src/trustee.rs" line_range="244" />
<code_context>
+pub async fn launch_trustee_sync_controller(client: Client) {
+ let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
+ let watcher_config = watcher::Config {
+ label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()),
+ ..Default::default()
+ };
</code_context>
<issue_to_address>
**issue (bug_risk):** The label selector string is not interpolating `KBS_LABEL_SELECTOR` and will not match any pods.
Because the selector is the literal string `"app={KBS_LABEL_SELECTOR}"`, the controller will never match any `kbs`-labelled resources. Build the selector string dynamically (e.g. `format!("app={}", KBS_LABEL_SELECTOR)` or a const with the expanded value) so it actually aligns with the deployment and service labels.
</issue_to_address>
### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="604-613" />
<code_context>
+ // Delete machine1 and verify its secret is removed from both K8s and KBS
</code_context>
<issue_to_address>
**issue (testing):** Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs
Right now the test only checks the operator logs for `"Secret {id1} deleted successfully"`, which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, `secrets_api.get(&machine1_uuid).await` returns `NOT_FOUND` or fails via the Poller).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| pub async fn launch_trustee_sync_controller(client: Client) { | ||
| let deployments: Api<Deployment> = Api::default_namespaced(client.clone()); | ||
| let watcher_config = watcher::Config { | ||
| label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()), |
There was a problem hiding this comment.
issue (bug_risk): The label selector string is not interpolating KBS_LABEL_SELECTOR and will not match any pods.
Because the selector is the literal string "app={KBS_LABEL_SELECTOR}", the controller will never match any kbs-labelled resources. Build the selector string dynamically (e.g. format!("app={}", KBS_LABEL_SELECTOR) or a const with the expanded value) so it actually aligns with the deployment and service labels.
| // Delete machine1 and verify its secret is removed from both K8s and KBS | ||
| machines | ||
| .delete(&machine1_name, &Default::default()) | ||
| .await?; | ||
| test_ctx.info(format!("Deleted Machine 1: {machine1_name}")); | ||
|
|
||
| let poller = Poller::new() | ||
| .with_timeout(Duration::from_secs(60)) | ||
| .with_interval(Duration::from_secs(2)) | ||
| .with_error_message("Machine1 secret not deleted from KBS".to_string()); |
There was a problem hiding this comment.
issue (testing): Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs
Right now the test only checks the operator logs for "Secret {id1} deleted successfully", which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, secrets_api.get(&machine1_uuid).await returns NOT_FOUND or fails via the Poller).
Currently, every time the LUKS key is updated, the operator patches the trustee deployment, which causes a restart of the pod. This PR introduces a way to avoid this by setting the LUKS key using the KBS API.
Core implementation points:
Tests:
Summary by Sourcery
Integrate KBS API–based LUKS key management for trustee, replacing deployment patching with authenticated API calls and adding automatic resync on trustee restarts.
New Features:
Enhancements:
Build:
Tests: